Top-Rank-k frequent patterns mining algorithm based on TCM prescription database
QIN Qibing, TAN Long
Journal of Computer Applications    2017, 37 (2): 329-334.   DOI: 10.11772/j.issn.1001-9081.2017.02.0329

Frequent pattern mining of Traditional Chinese Medicine (TCM) prescriptions depends heavily on empirically chosen parameters; reducing this dependency improves the accuracy of the mining results. To suit the characteristics of TCM prescription data, an efficient Top-Rank-k frequent pattern mining algorithm based on a Weighted Undirected Graph (WUG) was proposed. The new algorithm can directly mine frequent k-itemsets (k≥3) without first mining 1-itemsets and 2-itemsets, and then quickly backtrack to the prescriptions that correspond to the frequent itemsets of core drug combinations. In addition, the Dynamic Bit Vector (DBV) compression mechanism was used to store the edge weights of the undirected graph, which improves the space efficiency of the algorithm. Experiments were conducted on TCM prescription datasets, real datasets (Chess, Pumsb and Retail) and synthetic datasets (T10I4D100K and Test2K50KD1). The experimental results show that, compared with iNTK (improved Node-list Top-Rank-K) and BTK (B-list Top-Rank-K), the proposed algorithm performs better in both time and space, and that it can be applied to other types of datasets.
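To make the Top-Rank-k criterion concrete, the following is a minimal brute-force sketch, not the WUG-based algorithm from the paper. It assumes the usual definition in which an itemset qualifies when its support is among the k highest distinct support values, so ties share a rank. The function name `top_rank_k`, its parameters, and the toy item labels are hypothetical. Plain Python integers stand in for bit vectors of transaction ids; this is only loosely analogous to DBV and applies no compression.

```python
from itertools import combinations


def top_rank_k(transactions, k, min_size=1, max_size=3):
    """Brute-force Top-Rank-k (illustrative only).

    Returns {itemset: (support, rank)} for itemsets whose support is
    among the k highest distinct support values.
    """
    # One bitmask per item: bit i is set when transaction i contains the item.
    tidbits = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            tidbits[item] = tidbits.get(item, 0) | (1 << i)

    items = sorted(tidbits)
    supports = {}
    for size in range(min_size, max_size + 1):
        for combo in combinations(items, size):
            # Intersect the transaction sets of all items in the candidate.
            mask = tidbits[combo[0]]
            for item in combo[1:]:
                mask &= tidbits[item]
            support = bin(mask).count("1")
            if support > 0:
                supports[combo] = support

    # Rank distinct support values in descending order and keep the first k.
    top_values = sorted(set(supports.values()), reverse=True)[:k]
    rank = {value: r + 1 for r, value in enumerate(top_values)}
    return {c: (s, rank[s]) for c, s in supports.items() if s in rank}


# Toy data with made-up item labels (not real prescription data).
toy = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "b", "d"},
    {"b", "c", "d"},
    {"a", "b", "c"},
]
result = top_rank_k(toy, k=2, min_size=3, max_size=3)
for itemset, (support, r) in sorted(result.items(), key=lambda kv: kv[1][1]):
    print(itemset, "support =", support, "rank =", r)
```

Setting `min_size=3` restricts the output to itemsets of at least three items, mirroring the k≥3 focus of the abstract, although this sketch still enumerates candidates exhaustively rather than skipping smaller itemsets through a graph structure.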

Relational algebraic operation algorithm on compressed data
DING Xinzhe, ZHANG Zhaogong, LI Jianzhong, TAN Long, LIU Yong
Journal of Computer Applications    2016, 36 (1): 21-26.   DOI: 10.11772/j.issn.1001-9081.2016.01.0021
In massive data management, some operations can be performed on compressed data without decompressing it first. Based on this observation, and assuming normally distributed data, a new compression algorithm oriented to column storage, called CCA (Column Compression Algorithm), was proposed according to the features of column data storage. Firstly, the data were classified by length; secondly, sampling was used to obtain more repeated prefixes; finally, dictionary coding was used for compression. The Column Index (CI) and Column Reality (CR) served as the compressed data structure to reduce the storage requirement of massive data, so that basic relational algebraic operations such as select, project and join were supported directly and efficiently. A prototype database system based on CCA, called D-DBMS (Ding-Database Management System), was implemented. Theoretical analysis and experimental results on 1 TB of data show that the proposed compression algorithm can significantly improve the storage efficiency and data manipulation performance for massive data. Compared with the BAP (Bit Address Physical) and TIDC (TupleID Center) methods, the compression rate of CCA was improved by 51% and 14%, respectively, and its running speed by 47% and 42%, respectively.
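The idea of operating on compressed data without decompressing it can be illustrated with generic dictionary coding of a single column. This is a minimal sketch, not CCA itself: it omits the length classification, prefix sampling, and the CI and CR structures, whose layouts the abstract does not describe. The function names `dict_encode`, `select_eq` and `project`, and the sample values, are hypothetical.

```python
def dict_encode(column):
    """Return (dictionary, codes): distinct values in first-seen order,
    plus one small integer code per row."""
    code_of = {}
    dictionary = []
    codes = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def select_eq(dictionary, codes, value):
    """Row ids where the column equals `value`.

    The predicate is translated to a code once; the scan then compares
    integer codes only and never decodes a row.
    """
    try:
        target = dictionary.index(value)
    except ValueError:
        return []  # value does not occur in the column
    return [row for row, code in enumerate(codes) if code == target]


def project(dictionary, codes, rows):
    """Decode only the requested rows."""
    return [dictionary[codes[row]] for row in rows]


# Made-up sample column.
city = ["Harbin", "Beijing", "Harbin", "Shanghai", "Beijing"]
dictionary, codes = dict_encode(city)
matches = select_eq(dictionary, codes, "Beijing")
print(dictionary, codes, matches)
```

An equality join can follow the same pattern by mapping the codes of one column's dictionary to those of the other and then comparing integers, which is one plausible reason dictionary coding suits select, project and join on compressed columns.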